Do you need a math Ph.D. to become a data scientist? Absolutely not! This guide will show you how to learn math for data science and machine learning without taking slow, expensive courses.
How much math you’ll do on a daily basis as a data scientist varies a lot depending on your role. Keep reading to find out which concepts you’ll need to master to reach your goals.
To complete this guide, you’ll need at least basic Python* programming skills. We’ll be learning math in an applied, hands-on way.
Check out our guide, How to Learn Python for Data Science, The Self-Starter Way, for the fastest way to get up to speed with Python. We recommend at least completing up to Step 2 in that guide.
*note: other languages are fine too, but the examples will be in Python.
Math Needed for Data Science
The amount of math you’ll need depends on the role. First, every data scientist needs to know some statistics and probability theory. We have a separate guide for that.
What about other types of math? Well, here’s where the answer is more nuanced… it depends on how much original machine learning research you’ll be doing.
Application-Heavy Machine Learning Positions
In practice, especially in entry-level roles, you’ll often be using out-of-the-box ML implementations. There are robust libraries of common algorithms in many programming languages. You don’t need to reinvent the wheel.
Even so, interviewers may still test your basic linear algebra and multivariable calculus. Why do they do this?
Well, at some point, your team may still need to build custom implementations of ML algorithms. For example, you may need to adapt one to your tech stack or to expand its base functionality. To do so, you must be able to peel back ML algorithms and work with their innards.
R&D-Heavy Machine Learning Positions
Other roles need much more original ML research and development. You may need to translate algorithms from academic papers into working code. Or, you might research enhancements based on your business’s unique challenges.
In other words, you’ll be implementing algorithms from scratch much more often.
For these positions, mastery of both linear algebra and multivariable calculus is a must.
The Best Way to Learn Math for Data Science
The self-starter way to learn math for data science is to learn by “doing shit.” So we’re going to tackle linear algebra and calculus by using them in real algorithms!
Even so, you’ll want to learn or review the underlying theory up front. You don’t need to read a whole textbook, but you’ll want to learn the key concepts first.
Here are the 3 steps to learning the math required for data science and machine learning:
- Linear Algebra for Data Science – Matrix algebra and eigenvalues.
- Calculus for Data Science – Derivatives and gradients.
- Gradient Descent from Scratch – Implement a simple neural network from scratch.
Step 1: Linear Algebra for Data Science
Many machine learning concepts are tied to linear algebra. For example, PCA requires eigenvalues and regression requires matrix multiplication.
Also, most ML applications deal with high dimensional data (data with many variables). This type of data is best represented by matrices.
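To make this concrete, here is a minimal NumPy sketch of both ideas on a small data matrix. The synthetic data, the variable names (such as `true_w` and `X_reduced`), and the choice to solve regression through the normal equations are our own assumptions for illustration. It is one way to show the math, not the only way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data matrix: 200 rows (observations) by 3 columns (variables).
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])            # made-up "true" weights
y = X @ true_w + 0.01 * rng.normal(size=200)   # targets with a little noise

# Regression: solve the normal equations (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# PCA: eigen-decomposition of the covariance matrix of the centered data.
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (X.shape[0] - 1)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort components from largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the data onto the top 2 principal components.
X_reduced = X_centered @ eigenvectors[:, :2]

print("estimated weights:", np.round(w_hat, 2))
print("explained variance ratio:", np.round(eigenvalues / eigenvalues.sum(), 2))
```

In day-to-day work you would usually reach for a library routine instead, such as `np.linalg.lstsq` for regression. The point here is only to show where the matrix multiplication and the eigenvalues actually appear.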
Here are a few of the best free resources we’ve found for learning linear algebra for data science.
For application-heavy roles:
- Khan Academy has short, practical linear algebra lessons. They cover the most important topics.
For R&D-heavy roles:
- MIT OpenCourseWare offers a rigorous linear algebra class. The video lectures and course materials are all included.
And if you only need to review:
- Linear Algebra Review for Machine Learning (Video Series) – These are the optional linear algebra review videos for Andrew Ng’s machine learning course. The entire 6-part series (3.1 to 3.6) can be watched in under 1 hour. Recommended if you’ve taken linear algebra before and just need a quick review.
- The Matrix Cookbook (PDF) – Excellent reference resource for matrix algebra.
Step 2: Calculus for Data Science
Calculus is important for several key ML applications. For example, you’ll need to be able to calculate derivatives and gradients for optimization. In fact, one of the most common optimization techniques is gradient descent.
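Here is a small sketch of what that looks like in plain Python. The function `f`, its hand-derived gradient `grad_f`, the learning rate, and the step count are all our own illustrative choices. The numerical gradient is included only as a way to double-check a gradient you worked out by hand.

```python
def f(x, y):
    """A simple bowl-shaped function with its minimum at (3, -1)."""
    return (x - 3.0) ** 2 + 2.0 * (y + 1.0) ** 2

def grad_f(x, y):
    """Analytic gradient: the vector of partial derivatives of f."""
    return 2.0 * (x - 3.0), 4.0 * (y + 1.0)

def numerical_grad(func, x, y, h=1e-6):
    """Central-difference approximation of the gradient."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2.0 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2.0 * h)
    return dfdx, dfdy

def gradient_descent(grad, start, lr=0.1, steps=200):
    """Repeatedly take a small step in the direction opposite to the gradient."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y

x_min, y_min = gradient_descent(grad_f, start=(0.0, 0.0))
print("approximate minimum:", x_min, y_min)
print("analytic vs numerical gradient at (1, 2):",
      grad_f(1.0, 2.0), numerical_grad(f, 1.0, 2.0))
```

Each step moves the point a little further downhill, so the result should land very close to the true minimum at (3, -1). Try a larger `lr` to see how the steps can overshoot.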
Here are some of the best resources for learning calculus for data science.
For application-heavy roles:
- Khan Academy has short, practical multivariable calculus lessons. They cover the most important concepts.
For R&D-heavy roles:
- MIT OpenCourseWare offers a rigorous multivariable calculus class. The video lectures and course materials are all included.
And if you only need to review:
- Multivariable Calculus Review (Video) – This is a quick review of multivariable calculus in the format of solving practice problems. Recommended if you’ve taken multivariable calculus before and just need a quick review.
Step 3: Simple Neural Network from Scratch
Congratulations! You’ve got the theory out of the way. Now it’s time for the really fun part.
One of the best ways to learn math for data science and machine learning is to build a simple neural network from scratch.
You’ll use linear algebra to represent the network and calculus to optimize it. Specifically, you’ll code up gradient descent from scratch.
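To give you a taste before you start a full tutorial, here is a rough sketch of a tiny network trained with gradient descent in NumPy. Everything specific in it is our own assumption for illustration: the XOR toy data, one hidden layer of 4 tanh units, the cross-entropy loss, the learning rate, and helper names like `forward` and `gradients`. The tutorials may set things up differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward pass: two matrix multiplications, each followed by a nonlinearity."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)      # hidden activations, shape (n, hidden)
    P = sigmoid(H @ W2 + b2)      # predicted probabilities, shape (n, 1)
    return H, P

def loss_fn(P, y):
    """Mean binary cross-entropy (eps avoids taking log of zero)."""
    eps = 1e-12
    return float(-np.mean(y * np.log(P + eps) + (1 - y) * np.log(1 - P + eps)))

def gradients(X, y, params):
    """Backward pass: the chain rule applied layer by layer."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    H, P = forward(X, params)
    dZ2 = (P - y) / n             # gradient at the output pre-activation
    dW2 = H.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dH = dZ2 @ W2.T
    dZ1 = dH * (1.0 - H ** 2)     # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return [dW1, db1, dW2, db2]

def train(X, y, hidden=4, lr=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    params = [
        rng.normal(size=(X.shape[1], hidden)), np.zeros(hidden),
        rng.normal(size=(hidden, 1)), np.zeros(1),
    ]
    history = []
    for _ in range(epochs):
        _, P = forward(X, params)
        history.append(loss_fn(P, y))
        grads = gradients(X, y, params)
        # Gradient descent: move every parameter against its gradient.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history

# XOR toy problem: the label is 1 when exactly one input is 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

params, history = train(X, y)
_, P = forward(X, params)
print("loss: %.4f -> %.4f" % (history[0], history[-1]))
print("predictions:", P.ravel().round(2))
```

Notice how the two ingredients show up: the forward pass is matrix multiplication, and the backward pass is derivatives chained together. How well it fits XOR can depend on the random seed and the learning rate, so treat the printed numbers as something to experiment with.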
Don’t worry too much about the nuances of neural networks for now. It’s ok if you’re just following instructions and writing code. We’ll cover machine learning in depth in another guide, as this is for targeted math practice.
Follow along with the tutorials, and review theory as you go along. Plus, you’ll have a cool project to add to your portfolio afterward.
Here are a few awesome step-by-step guides:
- Neural Network in Python, Part 2 – This is an incredible tutorial that takes you through a simple neural network from end to end. It’s packed with helpful illustrations, and you’ll learn about how gradient descent fits in.
- Neural Nets to Recognize Handwritten Digits – We love this resource! This is a free online book that walks you through a famous application of neural networks. It explains ideas very intuitively, and it’s the most in-depth tutorial in this list.
- Implementing a Neural Network from Scratch – A shorter tutorial that also takes you through the process step by step.